This report analyzes metastatic melanoma omic data from MetMap project, as published in “A metastasis map of human cancer cell lines” by Jin, et al 2020., and is described in further detail on the MetMap website. The goals of this analysis are:
Here is a short explanation of some of the features that we’re about to dive into…
DNA barcode abundance detected in each organ relative to the pre-injected population, presented on a log10 scale, range from -4 to 4.
<= -4: non-metastatic
-4~-2: (weakly) metastatic, but with low confidence
>= -2: metastatic, with higher confidence
Percentage of animals that the cell lines were detected via barcode sequencing, ranges 0 to 1
NOTE: The above definitions were copied verbatim from the MetMap website.
Formula
\[ \text{Mean penetrance}_{\text{cell line X}} \;=\; \frac{\text{penetrance}_{\text{brain}} + \text{penetrance}_{\text{lung}} + \text{penetrance}_{\text{liver}} + \text{penetrance}_{\text{bone}} + \text{penetrance}_{\text{kidney}}}{5} \]
Formula
\[ \text{Relative Penetrance}_{\text{organ}} \;=\; \frac{\text{penetrance}_{\text{organ}}} {\text{Mean penetrance across organs}} \]
This ratio quantifies how specifically a cell line targets a given organ relative to its overall metastatic capability.
Define tropism based on the relative penetrance:
| Relative penetrance | Tropism interpretation |
|---|---|
| \(\ge 1.5\) | Strong organ-specific tropism |
| \(\ge 1.25\) and < 1.5 | Moderate organ-specific tropism |
| < 1.25 | No clear organ-specific tropism (generalist or minimal) |
We read, merge and process the following files:
Supplementary Table 03 MetMap cell line annotation.xlsx,
containing the “metadata” describing all cell lines in this studySupplementary Table 04 MetMap 500 met potential.xlsx,
organ-specific cell line metastatic scoresThese files were downloaded from the Metmap download page.
read_excel("./data/metmap_data/Supplementary Table 03 MetMap cell line annotation.xlsx") %>%
dplyr::filter(cancer_type == "melanoma") %>%
as.data.frame() -> metmap_melanoma_metadata
## get sheet names
sheet_names <- excel_sheets("./data/metmap_data/Supplementary Table 04 MetMap 500 met potential.xlsx")
dplyr::bind_rows(
## brain
read_excel("./data/metmap_data/Supplementary Table 04 MetMap 500 met potential.xlsx",
sheet = 1) %>%
dplyr::rename(CCLE_name = names(.)[1]) %>%
dplyr::mutate(organ = gsub("metp500\\.", "", sheet_names[1])),
## lung
read_excel("./data/metmap_data/Supplementary Table 04 MetMap 500 met potential.xlsx",
sheet = 2) %>%
dplyr::rename(CCLE_name = names(.)[1]) %>%
dplyr::mutate(organ = gsub("metp500\\.", "", sheet_names[2])),
## liver
read_excel("./data/metmap_data/Supplementary Table 04 MetMap 500 met potential.xlsx",
sheet = 3) %>%
dplyr::rename(CCLE_name = names(.)[1]) %>%
dplyr::mutate(organ = gsub("metp500\\.", "", sheet_names[3])),
## bone
read_excel("./data/metmap_data/Supplementary Table 04 MetMap 500 met potential.xlsx",
sheet = 4) %>%
dplyr::rename(CCLE_name = names(.)[1]) %>%
dplyr::mutate(organ = gsub("metp500\\.", "", sheet_names[4])),
## kidney
read_excel("./data/metmap_data/Supplementary Table 04 MetMap 500 met potential.xlsx",
sheet = 5) %>%
dplyr::rename(CCLE_name = names(.)[1]) %>%
dplyr::mutate(organ = gsub("metp500\\.", "", sheet_names[5]))) %>%
dplyr::filter(CCLE_name %in% metmap_melanoma_metadata$CCLE_name) %>%
as.data.frame() -> metmap_combined
## calculate z-scores
# metmap_combined %>%
# dplyr::group_by(organ) %>%
# dplyr::summarize(threshold_z = mean(penetrance) + sd(penetrance)) %>%
# as.data.frame() -> tropic_thresh
## merge metmap numerica and metadata into a single dataframe
metmap_combined %>%
dplyr::left_join(metmap_melanoma_metadata, by = "CCLE_name") %>%
# dplyr::left_join(tropic_thresh, by = "organ") %>%
# dplyr::mutate(
# penetrance_z = (penetrance - mean(penetrance, na.rm = TRUE)) / sd(penetrance, na.rm = TRUE),
# CI.95_z = (CI.95 - mean(CI.95, na.rm = TRUE)) / sd(CI.95, na.rm = TRUE),
# tropism = ifelse(penetrance_z >= threshold_z, "tropic", "metastatic"),
# tropism = ifelse(penetrance == 0, "non-metastatic", tropism),
# organ_tropism = paste0(organ, "_", tropism)) %>%
# group_by(CCLE_name) %>%
dplyr::mutate(mean_penetrance = mean(penetrance)) %>%
ungroup() %>%
dplyr::mutate(
relative_penetrance = penetrance / mean_penetrance,
tropism = case_when(
relative_penetrance >= 1.5 ~ 'strong_tropism',
relative_penetrance >= 1.25 & relative_penetrance < 1.5 ~ 'moderate_tropism',
TRUE ~ 'no_clear_tropism'),
organ_tropism = paste0(organ, "_", tropism)) %>%
dplyr::select(contains("name"), contains("id"), contains("CI."),
contains("penetrance"), #threshold_z,
everything()) %>%
dplyr::select(-cancer_subtype) %>%
as_tibble() -> metmap_merged
## save the converted metadata table ( for Takis)
metmap_merged %>%
writexl::write_xlsx(path = "./data/metmap_melanoma_merged_data.xlsx")
## save as RDS
saveRDS(metmap_merged, file = "./data/metmap_melanoma_merged_data.rds")
We display the merged and processed MetMap data in a dynamic HTML table below:
metmap_merged <- readRDS(file = "./data/metmap_melanoma_merged_data.rds")
## display table
metmap_merged %>%
dplyr::mutate(across(where(is.numeric), ~ signif(., digits = 3))) %>%
DT::datatable()
Breakdown of cell lines by organ tropism:
table(metmap_merged$organ_tropism) %>%
as.data.frame() %>%
dplyr::rename(organ_tropic_class = names(.)[1],
count = names(.)[2]) %>%
knitr::kable()
| organ_tropic_class | count |
|---|---|
| bone_moderate_tropism | 2 |
| bone_no_clear_tropism | 28 |
| bone_strong_tropism | 10 |
| brain_moderate_tropism | 3 |
| brain_no_clear_tropism | 29 |
| brain_strong_tropism | 8 |
| kidney_moderate_tropism | 6 |
| kidney_no_clear_tropism | 26 |
| kidney_strong_tropism | 8 |
| liver_moderate_tropism | 2 |
| liver_no_clear_tropism | 20 |
| liver_strong_tropism | 18 |
| lung_no_clear_tropism | 19 |
| lung_strong_tropism | 21 |
metmap_merged %>%
dplyr::select(CCLE_name, CI.95, organ) %>%
tidyr::pivot_wider(
names_from = organ,
values_from = CI.95) %>%
tibble::column_to_rownames("CCLE_name") %>%
as.data.frame() -> df_meta
metmap_scaled1 <- t(scale(t(df_meta))) # scale rows, not columns
metmap_scaled1[is.nan(metmap_scaled1)] <- 0
pheatmap(metmap_scaled1,
fontsize_row = 6,
cluster_rows = TRUE,
cluster_cols = TRUE,
color = colorRampPalette(c("blue", "white", "red"))(100),
main = "Z-scored CI.95 Metastatic Potential")
unique(metmap_merged$CCLE_name)
#> [1] "HS944T_SKIN" "MELHO_SKIN" "HT144_SKIN" "SH4_SKIN"
#> [5] "G361_SKIN" "COLO679_SKIN" "A375_SKIN" "IGR37_SKIN"
#> [9] "A2058_SKIN" "IGR1_SKIN" "A101D_SKIN" "HS939T_SKIN"
#> [13] "RVH421_SKIN" "HS852T_SKIN" "SKMEL30_SKIN" "COLO792_SKIN"
#> [17] "LOXIMVI_SKIN" "CHL1_SKIN" "CJM_SKIN" "WM983B_SKIN"
#> [21] "COLO783_SKIN" "MEWO_SKIN" "K029AX_SKIN" "MELJUSO_SKIN"
#> [25] "COLO800_SKIN" "IPC298_SKIN" "WM88_SKIN" "COLO741_SKIN"
#> [29] "UACC257_SKIN" "SKMEL5_SKIN" "SKMEL24_SKIN" "WM793_SKIN"
#> [33] "UACC62_SKIN" "MALME3M_SKIN" "MDAMB435S_SKIN" "HS294T_SKIN"
#> [37] "WM1799_SKIN" "WM2664_SKIN" "SKMEL2_SKIN" "SKMEL3_SKIN"
metmap_merged %>%
dplyr::select(CCLE_name, penetrance, organ) %>%
tidyr::pivot_wider(
names_from = organ,
values_from = penetrance) %>%
tibble::column_to_rownames("CCLE_name") %>%
as.data.frame() -> df_pen
metmap_scaled2 <- t(scale(t(df_pen))) # scale rows, not columns
metmap_scaled2[is.nan(metmap_scaled2)] <- 0
pheatmap(metmap_scaled2,
fontsize_row = 6,
cluster_rows = TRUE,
cluster_cols = TRUE,
color = colorRampPalette(c("blue", "white", "red"))(100),
main = "Z-scored Penetrance")
unique(metmap_merged$CCLE_name)
#> [1] "HS944T_SKIN" "MELHO_SKIN" "HT144_SKIN" "SH4_SKIN"
#> [5] "G361_SKIN" "COLO679_SKIN" "A375_SKIN" "IGR37_SKIN"
#> [9] "A2058_SKIN" "IGR1_SKIN" "A101D_SKIN" "HS939T_SKIN"
#> [13] "RVH421_SKIN" "HS852T_SKIN" "SKMEL30_SKIN" "COLO792_SKIN"
#> [17] "LOXIMVI_SKIN" "CHL1_SKIN" "CJM_SKIN" "WM983B_SKIN"
#> [21] "COLO783_SKIN" "MEWO_SKIN" "K029AX_SKIN" "MELJUSO_SKIN"
#> [25] "COLO800_SKIN" "IPC298_SKIN" "WM88_SKIN" "COLO741_SKIN"
#> [29] "UACC257_SKIN" "SKMEL5_SKIN" "SKMEL24_SKIN" "WM793_SKIN"
#> [33] "UACC62_SKIN" "MALME3M_SKIN" "MDAMB435S_SKIN" "HS294T_SKIN"
#> [37] "WM1799_SKIN" "WM2664_SKIN" "SKMEL2_SKIN" "SKMEL3_SKIN"
metmap_merged %>%
dplyr::select(CCLE_name, organ_tropism) %>%
dplyr::mutate(value = 1) %>%
tidyr::pivot_wider(
names_from = organ_tropism,
values_from = value,
values_fill = NA) %>%
tidyr::pivot_longer(
cols = -CCLE_name,
names_to = "organ_tropism",
values_to = "value") %>%
dplyr::mutate(
tropism_level = case_when(
str_detect(organ_tropism, "tropic") ~ "tropic",
str_detect(organ_tropism, "non-metastatic") ~ "non-metastatic",
str_detect(organ_tropism, "metastatic") ~ "metastatic",
TRUE ~ NA_character_),
fill_color = ifelse(is.na(value), "NA", tropism_level)) %>%
ggplot(aes(x = organ_tropism, y = CCLE_name, fill = fill_color)) +
geom_tile(color = "white") +
scale_fill_manual(
values = c(
"tropic" = "orangered",
"metastatic" = "limegreen",
"non-metastatic" = "steelblue",
"NA" = "gray90"),
name = "Tropism") +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
axis.text.y = element_text(size = 6),
panel.grid = element_blank(),
axis.ticks = element_blank()) +
labs(
title = "Organ Tropism by Cell Line",
x = "Organ Tropism",
y = "Cell Line")
metmap_merged %>%
ggplot(aes(x = organ, y = CI.95, fill = organ)) +
geom_violin() +
geom_jitter(width = 0.2, alpha = 0.5) +
geom_hline(yintercept = -2, color = "red", linetype = "dashed") +
theme_minimal() +
labs(
title = "Distribution of Metastatic Potential (CI.95) Across Organs",
x = "Organ",
y = "CI.95") +
theme(legend.position = "none")
metmap_merged %>%
ggplot(aes(x = organ, y = CI.95, fill = organ)) +
geom_boxplot() +
geom_jitter(width = 0.2, alpha = 0.5) +
geom_hline(yintercept = -2, color = "red", linetype = "dashed") +
theme_minimal() +
labs(
title = "Distribution of Metastatic Potential (CI.95) Across Organs",
x = "Organ",
y = "CI.95") +
theme(legend.position = "none")
metmap_merged %>%
ggplot(aes(x = organ, y = penetrance, fill = organ)) +
geom_violin() +
geom_jitter(width = 0.2, alpha = 0.5) +
theme_minimal() +
labs(
title = "Distribution of Penetrance Across Organs",
x = "Organ",
y = "Penetrance") +
theme(legend.position = "none")
metmap_merged %>%
ggplot(aes(x = organ, y = penetrance, fill = organ)) +
geom_boxplot() +
geom_jitter(width = 0.2, alpha = 0.5) +
theme_minimal() +
labs(
title = "Distribution of Penetrance Across Organs",
x = "Organ",
y = "Penetrance") +
theme(legend.position = "none")
metmap_merged %>%
dplyr::arrange(desc(CI.95)) %>%
dplyr::mutate(rank = row_number()) %>%
ggplot(aes(x = rank, y = CI.95, fill = organ)) +
geom_col() +
geom_hline(yintercept = -2, color = "red", linetype = "dashed") +
theme_classic() +
labs(title = "Waterfall Plot of Metastatic Potential (CI.95)",
x = "Ranked Cell Line Samples",
y = "CI.95 (Metastatic Potential)",
fill = "Organ") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank()) #-> p1
# plotly::ggplotly(p1)
#p1
metmap_merged %>%
dplyr::arrange(desc(CI.95)) %>%
dplyr::mutate(rank = row_number()) %>%
ggplot(aes(x = rank, y = CI.95, fill = tropism)) +
geom_col() +
geom_hline(yintercept = -2, color = "red", linetype = "dashed") +
theme_classic() +
labs(title = "Waterfall Plot of Metastatic Potential (CI.95)",
x = "Ranked Cell Line Samples",
y = "CI.95 (Metastatic Potential)",
fill = "Organ") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank()) #-> p1
# plotly::ggplotly(p1)
#p1
NOTE: zero penetrance values have changed to 0.025 so that the organ-specific penetrance can be visualized here.
metmap_merged %>%
dplyr::mutate(penetrance = if_else(penetrance == 0, 0.025, penetrance)) %>%
dplyr::arrange(penetrance) %>%
dplyr::mutate(rank = row_number()) %>%
ggplot(aes(x = rank, y = penetrance, fill = organ)) +
geom_col() +
theme_classic() +
labs(title = "Waterfall Plot of Penetrance",
x = "Ranked Cell Line Samples",
y = "Penetrance",
fill = "Organ") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank()) #-> p2
# plotly::ggplotly(p2)
#p2
NOTE: zero penetrance values have changed to 0.05 so that the organ-specific penetrance can be visualized here.
metmap_merged %>%
dplyr::mutate(penetrance = if_else(penetrance == 0, 0.025, penetrance)) %>%
dplyr::arrange(penetrance) %>%
dplyr::mutate(rank = row_number()) %>%
ggplot(aes(x = rank, y = penetrance, fill = tropism)) +
geom_col() +
theme_classic() +
labs(title = "Waterfall Plot of Penetrance",
x = "Ranked Cell Line Samples",
y = "Penetrance",
fill = "Organ") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank()) #-> p2
# plotly::ggplotly(p2)
#p2
metmap_merged %>%
ggplot(aes(x = CI.95, y = penetrance, color = organ)) +
geom_point(alpha = 0.7) +
geom_vline(xintercept = -2, color = "red", linetype = "dashed") +
geom_smooth(method = "lm", se = FALSE, linetype = "dashed") +
# facet_wrap(~ organ) +
theme_minimal() +
stat_cor(method = "spearman") +
labs(
title = "Correlation between Metastatic Potential (CI.95) and Penetrance",
x = "Metastatic Potential (CI.95)",
y = "Penetrance")
metmap_merged %>%
ggplot(aes(x = CI.95, y = penetrance)) +
geom_point(alpha = 0.7) +
geom_vline(xintercept = -2, color = "red", linetype = "dashed") +
geom_smooth(method = "lm", se = TRUE, linetype = "dashed") +
# facet_wrap(~ organ) +
theme_minimal() +
stat_cor(method = "spearman") +
labs(
title = "Correlation between Metastatic Potential (CI.95) and Penetrance",
x = "Metastatic Potential (CI.95)",
y = "Penetrance") #-> p4
# plotly::ggplotly(p4)
metmap_merged %>%
ggplot(aes(x = CI.95, y = penetrance, colour = tropism)) +
geom_point(alpha = 0.7) +
geom_vline(xintercept = -2, color = "red", linetype = "dashed") +
# geom_smooth(method = "lm", se = FALSE, linetype = "dashed") +
# facet_wrap(~ organ) +
theme_minimal() +
# stat_cor(method = "spearman") +
labs(
title = "Correlation between Metastatic Potential (CI.95) and Penetrance",
x = "Metastatic Potential (CI.95)",
y = "Penetrance") #-> p4
# plotly::ggplotly(p4)
metmap_merged %>%
ggplot(aes(x = CI.95, y = penetrance, colour = tropism)) +
geom_point(alpha = 0.7) +
geom_vline(xintercept = -2, color = "red", linetype = "dashed") +
# geom_smooth(method = "lm", se = FALSE, linetype = "dashed") +
# facet_wrap(~ organ) +
theme_minimal() +
# stat_cor(method = "spearman") +
labs(
title = "Correlation between Metastatic Potential (CI.95) and Penetrance",
x = "Metastatic Potential (CI.95)",
y = "Penetrance") #-> p4
# plotly::ggplotly(p4)
Below we create faceted scatter plots with regression line for each
organ. The vertical red dashed line at
Metastatic Potential = -2 represents where cell lines are
considered “metastatic” or not.
metmap_merged %>%
ggplot(aes(x = CI.95, y = penetrance, color = organ)) +
geom_point(alpha = 0.7) +
geom_vline(xintercept = -2, color = "red", linetype = "dashed") +
geom_smooth(method = "lm", se = FALSE, linetype = "dashed") +
facet_wrap(~ organ) +
theme_minimal() +
stat_cor(method = "spearman", label.x = -4, label.y = 1.05, size = 3.5) +
labs(
title = "Correlation between Metastatic Potential (CI.95) and Penetrance",
x = "Metastatic Potential (CI.95)",
y = "Penetrance")
organ_levels <- c("brain", "lung", "liver", "bone", "kidney")
n_organ <- length(organ_levels)
# Angles at which each petal is centered
angle_centers <- seq(0, 2 * pi - 2 * pi / n_organ, length.out = n_organ)
petal_max_width <- 2 * pi / n_organ # Max width (no more than 1/5 of circle)
petal_min_width <- 0 # petal_max_width * 0.1 # Set a minimum width for aesthetics
metmap_merged %>%
dplyr::mutate(
organ = factor(organ, levels = organ_levels),
organ_index = as.integer(organ) - 1,
angle_center = angle_centers[organ_index + 1], # R is 1-based
penetrance = pmax(penetrance, 0.001),
petal_width = petal_min_width + (petal_max_width - petal_min_width) * penetrance) %>%
dplyr::mutate(
angle_start = angle_center - petal_width / 2,
angle_end = angle_center + petal_width / 2,
r0 = 0,
scaled_r = rescale(CI.95, to = c(0, 1))) %>%
ggplot() +
ggforce::geom_circle(aes(x0 = 0, y0 = 0, r = 1), color = "gray40",
linetype = "dashed", inherit.aes = FALSE) +
geom_arc_bar(
aes(
x0 = 0, y0 = 0,
r0 = r0, r = scaled_r,
start = angle_start, end = angle_end,
fill = organ),
color = "black", alpha = 0.85) +
coord_fixed() +
facet_wrap(~CCLE_name, ncol = 5) +
theme_minimal() +
theme(
strip.text = element_text(size = 8),
axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank()) +
labs(
title = "Rose plots of MetMap melanoma cell lines by metastatized organ",
fill = "Organ")
Test/Approach: Hierarchical clustering or UMAP
Purpose: Reveal whether certain cell lines cluster by metastatic behavior across organs.
umap_result <- umap(metmap_scaled1)
umap_df <- as.data.frame(umap_result$layout)
umap_df$CCLE_name <- rownames(df_meta)
ggplot(umap_df, aes(x = V1, y = V2, label = CCLE_name)) +
geom_point(alpha = 0.7) +
geom_text_repel(size = 3, max.overlaps = 50) +
theme_minimal() +
labs(title = "UMAP of Cell Lines by Metastatic Potential",
x = "UMAP1", y = "UMAP2")
umap_result <- umap(metmap_scaled2)
umap_df <- as.data.frame(umap_result$layout)
umap_df$CCLE_name <- rownames(df_meta)
ggplot(umap_df, aes(x = V1, y = V2, label = CCLE_name)) +
geom_point(alpha = 0.7) +
geom_text_repel(size = 3, max.overlaps = 50) +
theme_minimal() +
labs(title = "UMAP of Cell Lines by Penetrance",
x = "UMAP1", y = "UMAP2")
rownames(metmap_scaled1) <- paste0("CI.95_", rownames(metmap_scaled1))
rownames(metmap_scaled2) <- paste0("pen_", rownames(metmap_scaled2))
joint_matrix <- cbind(metmap_scaled1, metmap_scaled2)
joint_scaled <- scale(joint_matrix)
umap_result <- umap(joint_scaled)
umap_df <- as.data.frame(umap_result$layout)
umap_df$CCLE_name <- rownames(df_meta)
ggplot(umap_df, aes(x = V1, y = V2, label = CCLE_name)) +
geom_point(alpha = 0.7) +
geom_text_repel(size = 3, max.overlaps = 50) +
theme_minimal() +
labs(title = "Joint UMAP of Cell Lines by Penetrance and Metastatic Potential",
x = "UMAP1", y = "UMAP2")
# joint_scaled
# Perform PCA using prcomp()
pca_result <- prcomp(joint_scaled, center = TRUE, scale. = TRUE)
# Summary of variance explained
# summary(pca_result)
# Access principal components
pca_scores <- as.data.frame(pca_result$x)
pca_scores$CCLE_name <- rownames(df_meta)
ggplot(pca_scores, aes(x = PC1, y = PC2, label = CCLE_name)) +
geom_point(alpha = 0.7) +
geom_text_repel(size = 3, max.overlaps = 50) +
theme_minimal() +
labs(
title = "PCA of Numeric Data",
x = "Principal Component 1",
y = "Principal Component 2"
)
We will run various statistical tests to determine the nature of the relationships between metastatic potential, penetrance and organ in melanoma cell lines. The following is a little background on the statistical tests that we will perform:
Kruskal-Wallis Test (Global Non-Parametric Test)
Purpose: To test whether there is a statistically significant difference in the distribution of a continuous variable (like CI.95 or penetrance) across more than two independent groups (e.g., organs: brain, liver, lung, etc.).
Why use Kruskal-Wallis instead of ANOVA?
The variable (e.g., CI.95) is log-transformed and likely non-normally distributed.
Kruskal-Wallis is a non-parametric test and does not assume normality or equal variance.
What it tells you: At least one group differs from the others — but not which one(s).
Dunn’s Test (Post-hoc Pairwise Comparisons)
Purpose: To identify which specific pairs of groups differ after a Kruskal-Wallis test has found a significant global difference.
Why not use multiple Wilcoxon tests directly?
Running many pairwise tests inflates the Type I error rate (false positives).
Bonferroni Correction (Control for Multiple Testing)
Why needed: We’re doing multiple pairwise comparisons (e.g., brain vs. liver, brain vs. lung, etc.).
What it does: Adjusts the p-value threshold to keep the family-wise error rate controlled.
For example, if you do 10 tests, Bonferroni adjusts the significance threshold from 0.05 to 0.005.
Effect: More conservative (reduces false positives, may increase false negatives).
# Global correlation
cor_test_global <- cor.test(metmap_merged$CI.95, metmap_merged$penetrance, method = "spearman")
print(cor_test_global)
#>
#> Spearman's rank correlation rho
#>
#> data: metmap_merged$CI.95 and metmap_merged$penetrance
#> S = 184049, p-value < 2.2e-16
#> alternative hypothesis: true rho is not equal to 0
#> sample estimates:
#> rho
#> 0.86196
cor_results_per_organ <- metmap_merged %>%
dplyr::group_by(organ) %>%
dplyr::summarize(
spearman_rho = cor(CI.95, penetrance, method = "spearman"),
p_value = cor.test(CI.95, penetrance, method = "spearman")$p.value)
print(cor_results_per_organ)
#> # A tibble: 5 × 3
#> organ spearman_rho p_value
#> <chr> <dbl> <dbl>
#> 1 bone 0.920 4.65e-17
#> 2 brain 0.867 4.54e-13
#> 3 kidney 0.899 3.66e-15
#> 4 liver 0.915 1.36e-16
#> 5 lung 0.825 6.06e-11
Test: Kruskal-Wallis test (non-parametric)
Why: CI.95 is continuous, log-transformed, and likely non-normal. We're comparing distributions across multiple organs.
Post-hoc: Dunn's test with Bonferroni correction.
# Kruskal-Wallis test for CI.95 across organs
kruskal_result <- kruskal.test(CI.95 ~ organ, data = metmap_merged)
print(kruskal_result)
#>
#> Kruskal-Wallis rank sum test
#>
#> data: CI.95 by organ
#> Kruskal-Wallis chi-squared = 2.4835, df = 4, p-value = 0.6476
# Dunn's test with Bonferroni correction
dunn_result <- dunnTest(CI.95 ~ organ, data = metmap_merged, method = "bonferroni")
print(dunn_result)
#> Comparison Z P.unadj P.adj
#> 1 bone - brain -0.43271014 0.6652254 1
#> 2 bone - kidney -1.24017817 0.2149095 1
#> 3 brain - kidney -0.80746803 0.4193969 1
#> 4 bone - liver -0.71957378 0.4717875 1
#> 5 brain - liver -0.28686364 0.7742167 1
#> 6 kidney - liver 0.52060439 0.6026424 1
#> 7 bone - lung -1.33097003 0.1831989 1
#> 8 brain - lung -0.89825989 0.3690470 1
#> 9 kidney - lung -0.09079186 0.9276580 1
#> 10 liver - lung -0.61139625 0.5409373 1
Test: Kruskal-Wallis or ANOVA (depending on distribution)
Alt: Chi-squared or Fisher's exact test if binarized penetrance (e.g. high ≥ 0.5, low < 0.5).
# Kruskal-Wallis test across organs
kruskal_penetrance <- kruskal.test(penetrance ~ organ, data = metmap_merged)
print(kruskal_penetrance)
#>
#> Kruskal-Wallis rank sum test
#>
#> data: penetrance by organ
#> Kruskal-Wallis chi-squared = 13.725, df = 4, p-value = 0.008228
# Dunn's test with Bonferroni correction
dunn_result <- dunnTest(penetrance ~ organ, data = metmap_merged, method = "bonferroni")
print(dunn_result)
#> Comparison Z P.unadj P.adj
#> 1 bone - brain 0.2600872 0.794796486 1.00000000
#> 2 bone - kidney -0.5338120 0.593471611 1.00000000
#> 3 brain - kidney -0.7938992 0.427254097 1.00000000
#> 4 bone - liver -1.6423486 0.100517781 1.00000000
#> 5 brain - liver -1.9024358 0.057114201 0.57114201
#> 6 kidney - liver -1.1085366 0.267630143 1.00000000
#> 7 bone - lung -2.9106391 0.003606903 0.03606903
#> 8 brain - lung -3.1707264 0.001520583 0.01520583
#> 9 kidney - lung -2.3768272 0.017462269 0.17462269
#> 10 liver - lung -1.2682906 0.204694209 1.00000000
Session Info
sessionInfo()
#> R version 4.5.0 (2025-04-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Europe/Brussels
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] grid stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] DT_0.33 RColorBrewer_1.1-3 knitr_1.50 umap_0.2.10.0
#> [5] ggpubr_0.6.0 plotly_4.10.4 pheatmap_1.0.13 FSA_0.10.0
#> [9] scales_1.4.0 ggforce_0.4.2 ggvenn_0.1.10 readr_2.1.5
#> [13] writexl_1.5.4 readxl_1.4.5 ggrepel_0.9.6.9999 ggplot2_3.5.2
#> [17] stringr_1.5.1 tibble_3.3.0 tidyr_1.3.1 dplyr_1.1.4
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 viridisLite_0.4.2 farver_2.1.2 fastmap_1.2.0
#> [5] lazyeval_0.2.2 tweenr_2.0.3 digest_0.6.37 lifecycle_1.0.4
#> [9] magrittr_2.0.3 compiler_4.5.0 rlang_1.1.6 sass_0.4.10
#> [13] tools_4.5.0 utf8_1.2.6 yaml_2.3.10 data.table_1.17.4
#> [17] ggsignif_0.6.4 askpass_1.2.1 labeling_0.4.3 htmlwidgets_1.6.4
#> [21] reticulate_1.42.0 abind_1.4-8 withr_3.0.2 purrr_1.0.4
#> [25] polyclip_1.10-7 dunn.test_1.3.6 MASS_7.3-65 dichromat_2.0-0.1
#> [29] cli_3.6.5 rmarkdown_2.29 generics_0.1.4 rstudioapi_0.17.1
#> [33] RSpectra_0.16-2 httr_1.4.7 tzdb_0.5.0 cachem_1.1.0
#> [37] splines_4.5.0 cellranger_1.1.0 vctrs_0.6.5 Matrix_1.7-3
#> [41] jsonlite_2.0.0 carData_3.0-5 car_3.1-3 hms_1.1.3
#> [45] rstatix_0.7.2 Formula_1.2-5 crosstalk_1.2.1 jquerylib_0.1.4
#> [49] glue_1.8.0 stringi_1.8.7 gtable_0.3.6 pillar_1.10.2
#> [53] htmltools_0.5.8.1 openssl_2.3.3 R6_2.6.1 evaluate_1.0.3
#> [57] lattice_0.22-7 png_0.1-8 backports_1.5.0 broom_1.0.8
#> [61] bslib_0.9.0 Rcpp_1.0.14 nlme_3.1-168 mgcv_1.9-3
#> [65] xfun_0.52 pkgconfig_2.0.3